Optimal Sequential Exploration: Bandits, Clairvoyants, and Wildcats (Online Appendix)

Authors

  • David B. Brown
  • James E. Smith

Similar resources

Optimal Sequential Exploration: Bandits, Clairvoyants, and Wildcats

This paper was motivated by the problem of developing an optimal strategy for exploring a large oil and gas field in the North Sea. Where should we drill first? Where do we drill next? The problem resembles a classical multiarmed bandit problem, but probabilistic dependence plays a key role: outcomes at drilled sites reveal information about neighboring targets. Good exploration strategies will...

Correlational Dueling Bandits with Application to Clinical Treatment in Large Decision Spaces

We consider sequential decision making under uncertainty, where the goal is to optimize over a large decision space using noisy comparative feedback. This problem can be formulated as a K-armed Dueling Bandits problem where K is the total number of decisions. When K is very large, existing dueling bandits algorithms suffer huge cumulative regret before converging on the optimal arm. This paper s...

Exploration-Free Policies in Dynamic Pricing and Online Decision-Making

Growing availability of data has enabled practitioners to tailor decisions at the individual level. This involves learning a model of decision outcomes conditional on individual-specific covariates or features. Recently, contextual bandits have been introduced as a framework to study these online and sequential decision making problems. This literature predominantly focuses on algorithms that ba...

Exponentiated Gradient LINUCB for Contextual Multi-Armed Bandits

We present Exponentiated Gradient LINUCB, an algorithm for contextual multi-armed bandits. This algorithm uses Exponentiated Gradient to find the optimal exploration of the LINUCB. Within a deliberately designed offline simulation framework we conduct evaluations with real online event log data. The experimental results demonstrate that our algorithm outperforms surveyed algorithms.

An Optimal Algorithm for Linear Bandits

We provide the first algorithm for online bandit linear optimization whose regret after T rounds is of order $\sqrt{Td \ln N}$ on any finite class $X \subseteq \mathbb{R}^d$ of N actions, and of order $d\sqrt{T}$ (up to log factors) when X is infinite. These bounds are not improvable in general. The basic idea utilizes tools from convex geometry to construct what is essentially an optimal exploration basis. We also present an app...

Publication date: 2013